{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 使用vLLM进行Yi-1.5-6B-Chat模型的推理\n",
    "\n",
    "欢迎来到本教程！在这里，我们将指导您如何使用vLLM进行Yi-1.5-6B-Chat模型的推理。vLLM是一个快速且易于使用的大型语言模型（LLM）推理和服务库。让我们开始吧！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🚀 在Colab上运行\n",
    "\n",
    "我们还提供了一键运行的[Colab脚本](https://colab.research.google.com/drive/1KuydGHHbI31Q0WIpwg7UmH0rfNjii8Wl?usp=drive_link)，让开发变得更简单！"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装\n",
    "\n",
    "首先，我们需要安装相关的依赖。根据官方文档要求，使用pip安装vLLM需要CUDA 12.1。您可以参考官方[文档](https://docs.vllm.ai/en/stable/getting_started/installation.html)获取更多详情。\n",
    "\n",
    "现在让我们安装vLLM："
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "installation"
   },
   "source": [
    "!pip install vllm"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载模型\n",
    "\n",
    "接下来，我们将加载Yi-1.5-6B-Chat模型。请注意电脑的显存和硬盘占用情况。如果出现错误，可能是由于资源不足引起的。\n",
    "\n",
    "本教程使用Yi-1.5-6B-Chat模型。以下是该模型的显存和硬盘占用情况：\n",
    "\n",
    "| 模型 | 显存使用 | 硬盘占用 |\n",
    "|-------|------------|------------------|\n",
    "| Yi-1.5-6B-Chat | 21G | 15G |"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "load-model"
   },
   "source": [
    "from transformers import AutoTokenizer\n",
    "from vllm import LLM, SamplingParams\n",
    "\n",
    "# 加载分词器\n",
    "tokenizer = AutoTokenizer.from_pretrained(\"01-ai/Yi-1.5-6B-Chat\")\n",
    "\n",
    "# 设置采样参数\n",
    "sampling_params = SamplingParams(\n",
    "    temperature=0.8, \n",
    "    top_p=0.8)\n",
    "\n",
    "# 加载模型\n",
    "llm = LLM(model=\"01-ai/Yi-1.5-6B-Chat\")"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型推理\n",
    "\n",
    "现在，让我们准备一个提示词模版并使用模型进行推理。在这个例子中，我们将使用一个简单的问候语提示词。"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "inference"
   },
   "source": [
    "# 准备提示词模版\n",
    "prompt = \"你好！\"  # 根据需要更改提示词\n",
    "messages = [\n",
    "    {\"role\": \"user\", \"content\": prompt}\n",
    "]\n",
    "text = tokenizer.apply_chat_template(\n",
    "    messages,\n",
    "    tokenize=False,\n",
    "    add_generation_prompt=True\n",
    ")\n",
    "print(text)\n",
    "\n",
    "# 生成回复\n",
    "outputs = llm.generate([text], sampling_params)\n",
    "\n",
    "# 打印输出\n",
    "for output in outputs:\n",
    "    prompt = output.prompt\n",
    "    generated_text = output.outputs[0].text\n",
    "    print(f\"Prompt: {prompt!r}, Generated text: {generated_text!r}\")\n",
    "# 期望的回复：\"你好！今天见到你很高兴。我能为你做些什么呢？\""
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "就是这样！您已经成功地使用vLLM进行了Yi-1.5-6B-Chat模型的推理。请随意尝试不同的提示词并调整采样参数，看看模型会如何响应。祝您实验愉快！"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
